Performance Evaluation of OpenMP Applications with Nested Parallelism
نویسندگان
چکیده
Many existing OpenMP systems do not su ciently implement nested parallelism. This is supposedly because nested parallelism is believed to require a signi cant implementation e ort, incur a large overhead, or lack applications. This paper demonstrates Omni/ST, a simple and e cient implementation of OpenMP nested parallelism using StackThreads/MP, which is a ne-grain thread library. Thanks to StackThreads/MP, OpenMP parallel constructs are simply mapped onto thread creation primitives of StackThreads/MP, yet they are e ciently managed with a xed number of threads in the underlying thread package (e.g., Pthreads). Experimental results on Sun Ultra Enterprise 10000 with up to 60 processors show that overhead imposed by nested parallelism is very small (1-3% in ve out of six applications, and 8% for the other), and there is a signi cant scalability bene t for applications with nested parallelism.
منابع مشابه
Support and Efficiency of Nested Parallelism in OpenMP Implementations
Nested parallelism has been a major feature of OpenMP since its very beginnings. As a programming style, it provides an elegant solution for a wide class of parallel applications, with the potential to achieve substantial utilization of the available computational resources, in situations where outer-loop parallelism simply can not. Notwithstanding its significance, nested parallelism support w...
متن کاملNested Parallelism in the OMPi OpenMP/C Compiler
This paper presents a new version of the OMPi OpenMP C compiler, enhanced by lightweight runtime support based on user-level multithreading. A large number of threads can be spawned for a parallel region and multiple levels of parallelism are supported efficiently, without introducing additional overheads to the OpenMP library. Management of nested parallelism is based on an adaptive distributi...
متن کاملRuntime Adjustment of Parallel Nested Loops
OpenMP allows programmers to specify nested parallelism in parallel applications. In the case of scientific applications, parallel loops are the most important source of parallelism. In this paper we present an automatic mechanism to dynamically detect the best way to exploit the parallelism when having nested parallel loops. This mechanism is based on the number of threads, the problem size, a...
متن کاملA Microbenchmark Study of OpenMP Overheads under Nested Parallelism
In this work we present a microbenchmark methodology for assessing the overheads associated with nested parallelism in OpenMP. Our techniques are based on extensions to the well known EPCC microbenchmark suite that allow measuring the overheads of OpenMP constructs when they are effected in inner levels of parallelism. The methodology is simple but powerful enough and has enabled us to gain int...
متن کاملPortable Support and Exploitation of Nested Parallelism in OpenMP
In this paper, we present an alternative implementation of the NANOS OpenMP runtime library (NthLib) that targets portability and efficient support of multiple levels of parallelism. We have implemented the runtime libraries of available opensource OpenMP compilers on top of NthLib, reducing thus their overheads and providing them with inherent support for nested parallelism. In addition, we pr...
متن کامل